Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description
Authors
Abstract
Imbalanced datasets (IDs), that is, datasets containing several (usually two) classes in which one class has considerably fewer samples than the other(s), emerge in many real-world problems, such as health care and disease diagnosis systems, anomaly detection, fraud detection, and stream-based malware detection. These datasets cause under-training of the minority class(es), over-training of the majority class(es), and a bias towards the majority class(es) in the classification process and its application. They have therefore attracted the attention of researchers in many fields, and several solutions have been proposed for dealing with this problem. The main aim of this study for dealing with IDs is to resample the borderline samples discovered by Support Vector Data Description (SVDD). There are naturally two kinds of resampling: under-sampling (U-S) and over-sampling (O-S). O-S may cause over-fitting (its main drawback), and U-S can cause significant information loss. In this study, to avoid the drawbacks of the sampling techniques, we focus on the samples that may be misclassified. The data points that may be misclassified are considered to be the borderline data points, which lie on the border(s) between the majority class(es) and the minority class(es). First, we use SVDD to find the borderline examples; then, resampling is applied over them. At the next step, a base classifier is trained on the newly created dataset. Finally, we compare the results of our method, in terms of Area Under Curve (AUC), F-measure, and G-mean, with other state-of-the-art methods. We show that our method has better results than the other methods in our experimental study.
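The pipeline outlined in the abstract (find borderline samples, resample them, train a base classifier, report AUC, F-measure and G-mean) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. Several choices here are assumptions that the abstract does not specify: scikit-learn's `OneClassSVM` with an RBF kernel stands in for SVDD (the two are commonly treated as equivalent for Gaussian kernels), the support vectors are taken as the borderline samples, minority borderline samples are duplicated once, a random half of the majority borderline samples is dropped, the base classifier is an `SVC`, and the data is synthetic. The helper names `boundary_indices`, `resample_boundary`, and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, OneClassSVM


def boundary_indices(X, nu=0.2, gamma="scale"):
    """Indices of the support vectors of a one-class model fitted on X.

    Assumption: an RBF OneClassSVM is used as a stand-in for SVDD, and its
    support vectors are treated as the borderline samples of the class.
    """
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
    return model.support_


def resample_boundary(X, y, minority=1, rng=None):
    """Over-sample minority and under-sample majority borderline samples."""
    rng = np.random.default_rng(rng)
    X_min, X_maj = X[y == minority], X[y != minority]
    y_maj = y[y != minority]

    b_min = boundary_indices(X_min)
    b_maj = boundary_indices(X_maj)

    # Assumed over-sampling rule: duplicate each minority borderline sample once.
    X_extra = X_min[b_min]

    # Assumed under-sampling rule: drop a random half of the majority
    # borderline samples; interior majority samples are kept untouched.
    drop = rng.choice(b_maj, size=len(b_maj) // 2, replace=False)
    keep = np.setdiff1d(np.arange(len(X_maj)), drop)

    X_new = np.vstack([X_maj[keep], X_min, X_extra])
    y_new = np.concatenate(
        [y_maj[keep], np.full(len(X_min) + len(X_extra), minority)]
    )
    return X_new, y_new


def evaluate(clf, X_test, y_test):
    """AUC, F-measure and G-mean for a binary problem with labels 0 and 1."""
    pred = clf.predict(X_test)
    score = clf.decision_function(X_test)
    sensitivity = recall_score(y_test, pred, pos_label=1)
    specificity = recall_score(y_test, pred, pos_label=0)
    return {
        "auc": roc_auc_score(y_test, score),
        "f_measure": f1_score(y_test, pred),
        "g_mean": float(np.sqrt(sensitivity * specificity)),
    }


# Synthetic imbalanced data (roughly 9:1), used only for illustration.
X, y = make_classification(
    n_samples=600, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

X_rs, y_rs = resample_boundary(X_tr, y_tr, minority=1, rng=0)
base_clf = SVC(kernel="rbf").fit(X_rs, y_rs)
metrics = evaluate(base_clf, X_te, y_te)
print(metrics)
```

Only the training split is resampled; the test split keeps its original class ratio so that the reported metrics reflect the imbalanced setting. The `nu` parameter controls how many samples end up as support vectors and would need tuning on real data.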
Similar resources
Dealing with Imbalanced Data using Bayesian Techniques
In the present work, we deal with the significant problem of high imbalance in data in binary or multi-class classification problems. We study two different linguistic applications. The former determines whether a syntactic construction (environment) that co-occurs with a verb in a natural text corpus constitutes a subcategorization frame of the verb or not. The latter is called Name Entity Recognitio...
Class-Boundary Alignment for Imbalanced Dataset Learning
In this paper, we propose the class-boundary-alignment algorithm to augment SVMs to deal with imbalanced training-data problems posed by many emerging applications (e.g., image retrieval, video surveillance, and gene profiling). Through a simple example, we first show that SVMs can be ineffective in determining the class boundary when the training instances of the target class are heavily outnum...
High performance of the support vector machine in classifying hyperspectral data using a limited dataset
To prospect mineral deposits at regional scale, recognition and classification of hydrothermal alteration zones using remote sensing data is a popular strategy. Due to the large number of spectral bands, classification of the hyperspectral data may be negatively affected by the Hughes phenomenon. A practical way to handle the Hughes problem is preparing a lot of training samples until the size ...
Ellipse Support Vector Data Description
This paper presents a novel boundary-based approach to one-class classification that is inspired by support vector data description (SVDD). The SVDD is a popular kernel method that tries to fit a hypersphere around the target objects, and a more precise boundary relies on selecting proper parameters for the kernel functions. Even with a flexible Gaussian kernel function, the SVDD cou...
Subspace Support Vector Data Description
This paper proposes a novel method for solving one-class classification problems. The proposed approach, namely Subspace Support Vector Data Description, maps the data to a subspace that is optimized for one-class classification. In that feature space, the optimal hypersphere enclosing the target class is then determined. The method iteratively optimizes the data mapping along with data descript...
Journal
Journal title: Computers, Materials & Continua
Year: 2021
ISSN: 1546-2218, 1546-2226
DOI: https://doi.org/10.32604/cmc.2021.012547